No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Abstract
BACKGROUND Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for the substantial differences between these extensive simulation studies.
METHODS The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean squared error of logit coefficients. Logistic regression models fitted by maximum likelihood and by a modified estimation procedure, known as Firth's correction, are compared.
RESULTS The results show that the problems associated with low EPV depend not only on EPV but also on other factors, such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
CONCLUSIONS The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
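The two quantities at the centre of the abstract, EPV and separation, can be illustrated with a minimal Python sketch. It is not the authors' simulation code. It assumes a single standard normal covariate, takes "events" to be the count of the less frequent outcome, and checks only complete separation (not quasi-complete separation). All function names and parameter values are hypothetical.

```python
import math
import random


def events_per_variable(y, n_covariates):
    """EPV, assuming 'events' means the count of the less frequent outcome."""
    events = sum(y)
    return min(events, len(y) - events) / n_covariates


def is_separated_1d(x, y):
    """True if a single covariate splits the two outcome groups perfectly."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        # Only one outcome class was observed: treated here as a degenerate
        # case in which the maximum likelihood estimate does not exist.
        return True
    return max(x0) < min(x1) or max(x1) < min(x0)


def separation_rate(n, beta, n_sims=2000, seed=1):
    """Monte Carlo estimate of how often a simulated data set is separated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One standard normal covariate; outcome drawn from a logistic model
        # with intercept 0 and slope beta (both illustrative assumptions).
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-beta * xi)) else 0
             for xi in x]
        hits += is_separated_1d(x, y)
    return hits / n_sims


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        print(n, separation_rate(n, beta=2.0))
```

Under these assumptions, the share of separated data sets should fall as the total sample size grows. This is one way to see why discarding, keeping, or otherwise handling such data sets can change the summary of a simulation study.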
Similar resources
Binary Regression With a Misclassified Response Variable in Diabetes Data
Objectives: Categorical data analysis is very important in statistics and medical sciences. When the binary response variable is misclassified, fitting the model will yield biased estimates of the adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...
A New Nonlinear Specification of Structural Breaks for Money Demand in Iran
In structural time series regression models, binary variables have been used to quantify qualitative or categorical events such as political and economic structural breaks, regions, age groups, etc. The use of binary dummy variables is not reasonable, because the effect of an event decreases (increases) gradually over time rather than all at once. The simple and basic idea in this paper i...
Application of Ordinal Logistic Regression Models in Quality of Life Studies
Background & Objectives: Due to the increasing tendency to measure the quality of life in recent years and the extensive quality of life questionnaires, it is important to determine the appropriate method of analyzing data derived from these studies. The aim of the present study was to introduce ordinal logistic regression models as an appropriate method for analyzing the data of quality of li...
Logistic Regression Tree Analysis
This chapter describes a tree-structured extension and generalization of the logistic regression method for fitting models to a binary-valued response variable. The technique overcomes a significant disadvantage of logistic regression, which is interpretability of the model in the face of multicollinearity and Simpson’s paradox. Section 1 summarizes the statistical theory underlying the logisti...
Phase II logistic profile monitoring
In many industrial and non-industrial applications, the quality of a process or product is characterized by a relationship between a response variable and one or more explanatory variables. This relationship is referred to as a profile. In the past decade, profile monitoring has been extensively studied under the normal response variable, but little attention has been paid to the profile with the ...